In machine learning, sample complexity is the number of examples needed for the estimate of a target function to be within a given error rate. The sample complexity of a machine learning algorithm characterizes its rate of consistency.

==Mathematical Setup==

Given a sample <math>S_n = \{(x_1, y_1), \ldots, (x_n, y_n)\}</math> drawn i.i.d. from a distribution <math>\rho</math> on some input space <math>Z = X \times Y</math>, a supervised learning algorithm chooses a function <math>f_n</math> from some hypothesis class <math>\mathcal{F}</math>. A desirable property of the algorithm is that it chooses functions with small expected prediction error with respect to <math>\rho</math> and some loss function <math>V \colon Y \times Y \to \mathbb{R}_{\geq 0}</math>. Specifically, it is desirable to have a consistent algorithm, that is, an algorithm that generates functions whose expected loss, or risk,

:<math>\mathcal{E}(f) = \mathbb{E}_{(x, y) \sim \rho}\left[V(f(x), y)\right]</math>

converges to the best possible risk

:<math>\mathcal{E}^*_{\mathcal{F}} = \inf_{f \in \mathcal{F}} \mathcal{E}(f).</math>

Formally, let <math>f_n</math> be the functions generated by an algorithm as the number of data points <math>n</math> grows. The algorithm is consistent if, for all <math>\epsilon > 0</math>,

:<math>\lim_{n \to \infty} \Pr_{\rho^n}\!\left[\mathcal{E}(f_n) - \mathcal{E}^*_{\mathcal{F}} > \epsilon\right] = 0,</math>

where <math>\rho^n</math> denotes the i.i.d. sampling measure <math>\rho \times \cdots \times \rho</math> (<math>n</math> times).

The consistency property is desirable, but it says nothing about how fast the risks converge. Since in practice one always deals with finite data, it is important to know how large a sample is needed to achieve a risk that is close, in the <math>\epsilon</math> sense, to the best possible for the function class. The notion of sample complexity answers this question: the sample complexity of a learning algorithm is a function <math>n(\rho, \epsilon, \delta)</math> such that for all <math>n \geq n(\rho, \epsilon, \delta)</math>,

:<math>\Pr_{\rho^n}\!\left[\mathcal{E}(f_n) - \mathcal{E}^*_{\mathcal{F}} > \epsilon\right] < \delta.</math>

In words, the sample complexity defines the rate of consistency of the algorithm: given a desired accuracy <math>\epsilon</math> and confidence <math>\delta</math>, one needs to sample at least <math>n(\rho, \epsilon, \delta)</math> data points to guarantee that the risk of the output function is within <math>\epsilon</math> of the best possible, with probability at least <math>1 - \delta</math>.
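As a worked illustration (a standard derivation sketched here for concreteness, not taken from this article), suppose <math>\mathcal{F}</math> is finite, the loss <math>V</math> is bounded in <math>[0, 1]</math>, and <math>f_n</math> is chosen by empirical risk minimization. By Hoeffding's inequality, for any fixed <math>f \in \mathcal{F}</math>,

:<math>\Pr_{\rho^n}\!\left[\left|\hat{\mathcal{E}}_n(f) - \mathcal{E}(f)\right| > \epsilon/2\right] \leq 2\exp\!\left(-n\epsilon^2/2\right),</math>

where <math>\hat{\mathcal{E}}_n(f)</math> denotes the empirical risk on the sample. A union bound over <math>\mathcal{F}</math> shows that, with probability at least <math>1 - 2|\mathcal{F}|\exp(-n\epsilon^2/2)</math>, every empirical risk is within <math>\epsilon/2</math> of its expectation, in which case the empirical risk minimizer satisfies <math>\mathcal{E}(f_n) - \mathcal{E}^*_{\mathcal{F}} \leq \epsilon</math>. Setting the failure probability to <math>\delta</math> and solving for <math>n</math> yields the sample complexity bound

:<math>n(\epsilon, \delta) = \frac{2}{\epsilon^2} \ln\frac{2|\mathcal{F}|}{\delta},</math>

which, notably, holds for every distribution <math>\rho</math>.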
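The definition can also be explored numerically. The following is a minimal sketch (the toy distribution, the threshold grid, and all names are illustrative assumptions, not part of the article): it learns a noisy threshold classifier on <math>[0, 1]</math> by empirical risk minimization over a finite grid of thresholds, and uses Monte Carlo trials to estimate <math>\Pr[\mathcal{E}(f_n) - \mathcal{E}^*_{\mathcal{F}} > \epsilon]</math> as <math>n</math> grows.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: x ~ Uniform[0, 1], clean label 1{x >= 0.5}
# flipped with probability NOISE; the hypothesis class F is the finite
# grid of threshold classifiers f_t(x) = 1{x >= t}.
TRUE_THRESHOLD = 0.5
NOISE = 0.1
THRESHOLDS = np.linspace(0.0, 1.0, 101)

def sample(n):
    """Draw n i.i.d. pairs (x, y) from the distribution rho."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = (x >= TRUE_THRESHOLD).astype(int)
    flip = rng.uniform(size=n) < NOISE
    return x, np.where(flip, 1 - y, y)

def true_risk(t):
    """Exact expected 0-1 loss of f_t under rho (closed form for this toy)."""
    # f_t disagrees with the clean label on an interval of width |t - 0.5|,
    # where its loss is 1 - NOISE; elsewhere its loss is NOISE.
    width = abs(t - TRUE_THRESHOLD)
    return width * (1 - NOISE) + (1 - width) * NOISE

def erm(x, y):
    """Empirical risk minimization over the threshold grid."""
    errors = [(np.mean((x >= t).astype(int) != y), t) for t in THRESHOLDS]
    return min(errors)[1]

best_risk = min(true_risk(t) for t in THRESHOLDS)  # E*_F, equal to NOISE here

# Estimate Pr[E(f_n) - E*_F > eps] by Monte Carlo for growing n.
eps, trials = 0.05, 200
for n in [10, 50, 100, 500, 1000]:
    gaps = [true_risk(erm(*sample(n))) - best_risk for _ in range(trials)]
    prob = np.mean(np.array(gaps) > eps)
    print(f"n={n:5d}  Pr[excess risk > {eps}] ~ {prob:.2f}")
</syntaxhighlight>

As <math>n</math> grows, the estimated probability decays toward zero, illustrating consistency; the smallest <math>n</math> at which it falls below a chosen <math>\delta</math> gives an empirical read of <math>n(\rho, \epsilon, \delta)</math> for this distribution.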